Poster: brewster Date: May 4, 2008 7:46am
Forum: texts Subject: Re: How do Microsoft digitize texts?

Microsoft digitizes books in many different ways. One way is to sponsor digitization done by the Internet Archive.

Those books are photographed in color and then put through a set of processing steps, including Optical Character Recognition, to create searchable PDFs, DjVu files, and a flip book.

See the Scribe Bookscanning system for more information.

I hope this answers your questions.



Poster: stringybark Date: May 4, 2008 11:39pm
Forum: texts Subject: Re: How do Microsoft (make that Scribe) digitize texts?

Thanks for that lead, Brewster, I found a couple of things (oddly, mentioning a guy named Brewster):
plus a really useless Wikipedia entry.

So it's Scribe that does the work; Microsoft pays the money and gets watermarking rights. But the really intriguing thing is the way the image is processed - it seems to re-composite the photographed image as a number of layers, and the compression is definitely not JPEG.

There should be more of this form of digitization - excellent results, and it definitely allows access to material that is practically impossible to get hold of. Hence it would be excellent if the technological details were well known. That would allow lots of corridor talk, leading maybe to more players on the Internet Archive side.


Poster: agharbeia Date: Jul 14, 2008 7:05am
Forum: texts Subject: Re: How do Microsoft digitize texts?

Are they still digitising?

I thought Microsoft had wound their digitisation initiative down.

This post was modified by agharbeia on 2008-07-14 14:05:43